NxRepair: error correction in de novo sequence assembly using Nextera mate pairs
Abstract
Scaffolding errors and incorrect repeat disambiguation during de novo assembly can result in large-scale misassemblies in draft genomes. Nextera mate pair sequencing data provide additional information to resolve assembly ambiguities during scaffolding. Here, we introduce NxRepair, an open source toolkit for error correction in de novo assemblies that uses Nextera mate pair libraries to identify and correct large-scale errors. We show that NxRepair can identify and correct large scaffolding errors without use of a reference sequence, resulting in quantitative improvements in assembly quality. NxRepair can be downloaded from GitHub or PyPI, the Python Package Index; a tutorial and user documentation are also available.
Similar resources
Choosing a Benchtop Sequencing Machine to Characterise Helicobacter pylori Genomes
The fully annotated genome sequence of the European strain, 26695 was first published in 1997 and, in 1999, it was directly compared to the USA isolate J99, promoting two standard laboratory isolates for Helicobacter pylori (H. pylori) research. With the genomic scaffolds available from these important genomes and the advent of benchtop high-throughput sequencing technology, a bacterial genome ...
De novo fragment assembly with short mate-paired reads: Does the read length matter?
Increasing read length is currently viewed as the crucial condition for fragment assembly with next-generation sequencing technologies. However, introducing mate-paired reads (separated by a gap of length, GapLength) opens a possibility to transform short mate-pairs into long mate-reads of length approximately GapLength, and thus raises the question as to whether the read length (as opposed to ...
Robust Error Correction for De Novo Assembly via Spectral Partitioning and Sequence Alignment
Error correction is the first step for any de novo assembly using next generation sequencing (NGS) data. This task is quite difficult and most available error correction software only supports base mismatches. In this work we propose a novel approach based on spectral graph clustering and Smith-Waterman alignment. This approach not only supports insertions and deletions, but also does not make an...
Paired de Bruijn Graphs: A Novel Approach for Incorporating Mate Pair Information into Genome Assemblers
The recent proliferation of next generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been mate pairs, which facilitate the assembly of repeating regions. Mate pairs have been algorithmically inco...
An Integrated Pipeline for de Novo Assembly of Microbial Genomes
Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process, involving several stages of sequence data cleaning, error correction, assembly, an...